RECONFIGURABLE INTEGRATED CIRCUIT 

BACKGROUND OF THE INVENTION 

In today's competitive multimedia marketplace, Integrated Circuit (IC) suppliers,
Original Equipment Manufacturers (OEMs) and network/service providers are faced with an
array of dilemmas. Functional integration, dramatic increases in complexity, new
technologies and ever-changing and competing standards, together with increased
time-to-market pressures, are making the selection of the right functionality-cost mix ever more
difficult. Furthermore, end customers are demanding more sophisticated feature sets, which
in turn require an enormous amount of additional processing power.

The constant introduction of new standards means conventional equipment is 
effectively obsolete before it leaves the factory. This is a particular concern to 
network/service providers, such as cable, satellite and terrestrial television providers and mobile
phone operators, as they significantly subsidize the cost of this equipment to the consumer.
Consequently, the introduction of new equipment erodes their profits. Therefore, having
equipment that could adapt to changing standards, upgrades and new applications via the
Internet and/or a broadcast channel would be a significant advantage.

To further compound the issue, the introduction of new European environmental
legislation in 2004 will make OEMs responsible for waste management. The Waste Electrical
and Electronic Equipment (WEEE) and Restriction of the use of certain Hazardous
Substances (RoHS) legislation will mean manufacturers of consumer goods will need to
adopt a more environmentally friendly manufacturing strategy. They will also be responsible
for product recycling.

At the IC device level, it is becoming increasingly difficult with existing IC
technologies and design methodologies for designers to meet the demands outlined above.
Several IC technologies exist, but they all have disadvantages and are not optimised for a
particular application.

Application Specific Integrated Circuits (ASICs) have their circuits and hence their
functionality fixed at manufacture and so cannot be used for new or different applications.
They have long development cycles and require huge upfront Non-Recurring Engineering
(NRE) costs. This makes them prohibitively expensive, especially for lower cost applications.

Microprocessors and Digital Signal Processors (DSPs) provide a degree of flexibility
with regard to reconfiguration through software. However, these devices still employ fixed or
rigid hardware and, as they are general purpose devices, are not optimised for a particular
application. This is particularly true when compared to a parallel hardware solution. A
microprocessor can only process one instruction at a time and is therefore much slower and
less efficient. While operating, many of its circuits are not being utilized. This is a waste of
expensive silicon real estate and increases power consumption. To increase the throughput,
designers can employ more than one processor. However, this just compounds the cost,
power efficiency and area issues.

Current programmable logic devices, such as Field Programmable Gate Arrays
(FPGAs), provide a better solution. However, FPGAs are very expensive and are
general-purpose devices consisting of an array of uniform programmable elements, usually based on
look-up tables (LUTs) interconnected using programmable interconnect. Consequently, they
are not optimised for a particular application and hardware utilization can be poor. Though
they allow reconfiguration in the field, the process is slow and cumbersome and does not allow
real-time reconfiguration.

Many multimedia processes require several complex digital signal-processing 
algorithms. Each algorithm itself comprises many sub-functions, some of which can be
executed in parallel. Some of these sub-functions or processes, such as digital filtering, 
convolution, Fast Fourier Transforms (FFTs), Discrete Cosine Transforms (DCTs), require 
many arithmetic and logical computations per data sample. These arithmetic and logical 
computation operations tend to be the same operation executed many times, such as multiply 
and accumulate (MAC) operations. Consequently, the hardware to implement these different 
processes is very similar and can be optimised and shared for these applications. Exploiting 
the parallel form of certain algorithms by implementing hardware to perform the separate 
parallel functions simultaneously provides hardware acceleration of the algorithm, enabling it
to be executed more quickly. A goal of the present invention is to provide processing
resources in the reconfigurable integrated circuit that can execute functions in parallel and 
provide hardware acceleration. 

Figure 2 is a logical block diagram that outlines the processing and resource 
requirements for a generic multimedia system or algorithm 100. The algorithm can be 
partitioned into several distinct functions each having its own processing and resource 
requirements. The algorithm input block 101 operates at a lower rate than the core functions 
103, but tends to require shared resources. Received data needs to be formatted or
pre-processed 102 before being transferred to parallel algorithmic resources 103. These are
dedicated resources, which operate at high frequencies that are many times the data sample 
rate. Data is then post processed or merged 104 before being output 105 via one or a plurality 
of output channels. These latter two functions require medium processing rates and shared 
resources. 

As well as parallel processing, an algorithm may contain certain sub-functions that are
performed sequentially, with each subsequent sub-function requiring data to be processed by the
previous sub-function. In an ASIC or FPGA design each sub-function will require dedicated
circuitry. However, by reconfiguring the available logic resources the reconfigurable logic
can be altered in real-time to implement each of the sequential sub-functions, consequently
reducing the number of logic gates and silicon real estate. It is another goal of the present
invention to provide a reconfigurable integrated circuit that optimises the logic resources
for a particular application.

Another problem facing integrated circuit designers is the choice of device interfaces. 
There are many interface standards available several of which are constantly being upgraded. 
One solution is to implement several interfaces on a device to enable it to be employed in 
several different applications. However, this is costly and inefficient especially when an 
interface requires wide address and data buses. One of the goals of the present invention is to 
provide reconfigurable logic resources to allow a designer to implement different interfaces 
using the same logic resources. 



Another goal of the present invention is to provide logic resources with varying
degrees of reconfiguration rate. Some reconfigurable resources only need to be configured at
the start of device operation, such as interface type, clock rate and memory sizes. Other 
algorithmic blocks implement functions, which perform operations at a rate lower than the 
maximum clock frequency used by a particular device. These algorithmic blocks tend to 
perform similar operations. Therefore, several different algorithms can be implemented by 
dynamically sharing common logic resources. 

This concept can be extended for implementing finite state machines. Figure 3 shows 
a generic block diagram of a finite state machine. The current state 906 is stored in register 
901 and is clocked using clocking signal 909. Current state 906 together with inputs 904 are 
input into the next state generation logic 900 to determine the next state 905 and actions. At 
the next clock cycle the next state vector 905 is transferred to the current state register 901.
Likewise, any outputs are registered in register 902. In some finite state machines variables 
908 need to be updated at certain times. Variable update logic 903 is used to perform these 
calculations. The finite state machine can be reset using reset signal 910. 

The stages of operation are shown in figure 4. For each state there can be several test 
conditions. Each of these is tested 9A. Then the appropriate one is selected 9B. Based on the 
selected test condition the next state, outputs and actions are selected 9C. At the start of the 
next clock cycle the next state, outputs and actions are updated 9D. 
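
The four stages 9A to 9D described above can be illustrated in software. The following is a
minimal sketch only, written in Python for readability rather than as a hardware description.
The class name, the transition table layout and the two-state example are assumptions made for
illustration and do not appear in the figures. The variable update logic 903 is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration: all names and the example table are assumptions.
# One entry per test condition of a state: (condition on inputs, next state, outputs).
Transition = Tuple[Callable[[dict], bool], str, dict]


@dataclass
class FiniteStateMachine:
    table: Dict[str, List[Transition]]            # stands in for next state logic 900
    reset_state: str
    current_state: str = ""                       # stands in for state register 901
    outputs: dict = field(default_factory=dict)   # stands in for output register 902

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Model of reset signal 910: return to the reset state."""
        self.current_state = self.reset_state
        self.outputs = {}

    def clock(self, inputs: dict) -> None:
        """One cycle of clocking signal 909, following stages 9A to 9D."""
        transitions = self.table[self.current_state]
        # 9A: evaluate every test condition for the current state.
        results = [cond(inputs) for cond, _, _ in transitions]
        # 9B: select the first condition that holds (priority order is assumed).
        selected = next((i for i, hit in enumerate(results) if hit), None)
        if selected is None:
            return  # no condition met: hold the state and outputs
        # 9C: look up the next state and outputs for the selected condition.
        _, next_state, next_outputs = transitions[selected]
        # 9D: update the registers at the clock edge.
        self.current_state = next_state
        self.outputs = dict(next_outputs)


# Example: a two-state machine that waits for a start flag, then runs until done.
fsm = FiniteStateMachine(
    table={
        "IDLE": [(lambda i: i.get("start", False), "RUN", {"busy": 1})],
        "RUN": [(lambda i: i.get("done", False), "IDLE", {"busy": 0})],
    },
    reset_state="IDLE",
)
fsm.clock({"start": True})
```

In this sketch only the table entry for the current state is consulted on each cycle, which
mirrors the observation that the logic for the other states is idle at that moment.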

However, one of the problems of implementing finite state machines is that logic 
circuitry is required to perform functions associated with each state. This also means these 
individual circuits are dissipating power even if they are not being used as in an ASIC or 
FPGA implementation. For a complex state machine with many states this requires a lot of 
silicon resources. A solution to this problem is to implement the logic for each state only 
when it is required. By dynamically reconfiguring and sharing logic resources a finite state 
machine can be implemented in a smaller area with reduced power consumption. 

One of the disadvantages of using Field Programmable Gate Arrays (FPGAs) is that 
they are not optimised for a particular application due to replication of uniform 
programmable logic elements. Yet another goal of the present invention is to provide a
reconfigurable integrated circuit that employs non-uniform or a diverse range of rigid
elements and programmable-rigid elements, which target a particular group of applications, 
such as audio, video and telecommunication applications. The term rigid element means a 
hardwired circuit dedicated to implementing a particular function or functions. The hardwired 
circuit can be "constructed" from one or more hardwired sub-circuits. The term 
programmable-rigid element means a circuit that contains hardwired circuitry, but certain 
parts of the circuitry can be reconfigured via memory means so the circuit can implement one 
of a plurality of similar functions. This includes a micro-coded controller. The term 
reconfigurable element refers to a block of logic that can be reconfigured to implement a 
wide variety of combinatorial and/or synchronous logic functions. Though synchronous logic
is normally employed there is no reason why asynchronous logic cannot be employed in the 
hardwired circuits used in the reconfigurable integrated circuit. 

Video processing tends to work on 8-bit data values as in MPEG2. However, audio 
applications require a greater range of bit widths. Compact Disc (CD) data was originally set
at 16 bits. However, the sample resolution for new audio systems has changed to 18 bits,
20 bits and now 24 bits. In voice data systems data is coded and transmitted serially.
Consequently, fine grain bit resolution processing is required. Therefore, a reconfigurable 
integrated circuit targeted at audio applications will need to implement both coarse and fine 
grain processing elements. 

Several attempts have been made to provide an integrated circuit device solution, 
which provides the speed of parallel hardware with the flexibility of software. However, 
these solutions have had many limitations. Some have provided replicated coarse grained 
processing elements to target particular digital signal processing problems and therefore lack 
the versatility of a full reconfigurable solution. 

For example, Marshall et al. EP0858167 (priority EP 19970300562), entitled "Field 
Programmable Processor Arrays", January 29 1997, describes a device in which processing 
units can be densely connected efficiently and in a flexible way so they can be 
interconnected. However, the processor array is made up from the same arithmetic logic units 
(ALUs) repeated many times. Each ALU is 4-bits wide and control functions seem limited. 
There are no diverse computational blocks. The device is geared to data path processing and
in particular repetitive operations. The device has specific applications and does not provide
functions for implementing control, interfaces, input, output, finite state machines and 
general reconfiguration operations, as required in a more general purpose device. 

Tavana et al. U.S. Pat. No. 6,094,065, entitled "Integrated Circuit with Field 
Programmable and Application Specific Logic Areas", issued Jul. 25, 2000, discloses use of 
a field programmable gate array in a parallel combination with a mask-defined application 
specific logic area. The intention is to provide post-fabrication reconfiguration logic means to
enable bug fixes and error corrections. However, this approach is limited and suffers from the 
disadvantages associated with ASICs and FPGAs, such as low logic utilization, greater power 
consumption, low speed and high cost. 

Master et al. U.S. Pat. No. 20020138716, entitled "Adaptive integrated circuitry with
heterogeneous and reconfigurable matrices of diverse and adaptive computational units 
having rigid, application specific computational elements", issued September 26, 2002, 
describes an integrated circuit which employs rigid hardware elements which can be 
reconfigured in real time. However, there are several disadvantages to this approach. Firstly, 
each computation unit comprises several different rigid computational elements and a single 
computational unit controller. A plurality of computation units is used to form a matrix, 
which is then replicated many times to form an array of matrices. This is an inefficient use of 
hardware resources as the computational unit controller will only be using one of the plurality 
of computational elements depending on the algorithm being implemented. Therefore, the
hardware utilization can be low. Secondly, the computational unit controller can only access 
the computational elements in its own computation unit. There is no sharing of resources by 
different computational unit controllers. Again, this is inefficient. Thirdly, the same 
computational elements and matrices are repeated across the integrated circuit to form a large 
array. There is no grading of reconfigurable resources across the integrated circuit in relation 
to the processing and resource requirements for different functions used to implement a 
system, such as input interfaces, output interfaces, parallel processing and protocol 
processing and data formatting. 

Consequently, there is a need for a reconfigurable integrated circuit that provides the 
speed of parallel hardware, as employed in an ASIC device, with the reconfigurable
flexibility of software for a targeted application. The reconfigurable integrated circuit will
allow dynamic sharing of resources, both rigid and programmable-rigid, to maximise 
hardware utilization, employ different grades of processing resources depending on the 
algorithmic sub-function level within a system and be reconfigurable in both real-time and 
non real-time. These reconfigurable logic devices enable the same device to implement many 
different functions and standards in hardware. They effectively evolve with changing 
standards and so reduce obsolescence. The result is a reconfigurable integrated circuit 
solution with orders of magnitude functional density improvement over traditional integrated 
circuit solutions and one that is more efficient in terms of cost, power consumption and use 
of silicon real estate. 



SUMMARY OF THE INVENTION 



The present invention provides a reconfigurable integrated circuit comprising a
plurality of controller elements, the plurality of controller elements including a first controller
element and a second controller element, the first controller element having a first
architecture and the second controller element having a second architecture, the first
architecture being different from the second architecture; and a plurality of processing elements,
the plurality of processing elements including a first processing element and a second
processing element, the first processing element having a first architecture and the second
processing element having a second architecture, the first architecture being different from
the second architecture. Reconfigurable interconnection means is used to connect and
transfer data and control signals between processing elements. It is also used to interconnect
processing elements and controller elements. The reconfigurable interconnection means can
be dynamically reconfigured in real time and non real time, providing different
interconnection configurations between processing elements and controller elements. One or a
plurality of the controller elements can control the reconfigurable interconnect and
implement different interconnection configurations on either a local block basis or an
inter-block basis.



BRIEF DESCRIPTION OF THE DRAWINGS 



Figure 1 shows a logical block diagram of a reconfigurable integrated circuit having one level 
of processing block. 

Figure 2 is a logical block diagram showing the sub-functions of a generic algorithm and
detailing the processing and resource requirements at various stages in the
algorithm.

Figure 3 is a generic block diagram of a finite state machine. 

Figure 4 outlines the different stages performed by a finite state machine.

Figure 5 shows a particular type of processing block that employs shared resources. 

Figure 6 shows a particular type of processing block that employs dedicated resources. 

Figure 7 shows a logical block diagram of a particular type of shared processing element. 

Figure 8 shows the protocol format used by the processing element shown in figure 7. 

Figure 9 shows a logical block diagram of a generic reconfigurable finite state machine. 

Figure 10 shows a logical block diagram of a generic dedicated controller element and 
processing element. 

Figure 11 shows a logical block diagram for interconnecting different processing blocks in a
hierarchical fashion. 

Figure 12 shows a logical block diagram for interconnecting different processing blocks in a 
fractal fashion. 

Figure 13 details a particular method of implementing the reconfigurable interconnect. 

Figure 14 details a particular method of isolating reconfigurable interconnect.

Figure 15 shows a logical block diagram for implementing a programmable-rigid processing
resource.

Figure 16 shows in part one section of a programmable-rigid serial finite state machine
resource. 

Figure 17 details a section of pseudo code for implementing an AC3 function. 

Figure 18 shows the corresponding data flow graph for the AC3 function. 

Figure 19 shows how various processing elements are concatenated using the reconfigurable 
interconnect to implement the AC3 function in stage[i]. 

Figure 20 details another section of pseudo code for implementing a different, but related 
AC3 function. 

Figure 21 shows the corresponding data flow graph for the second AC3 function. 

Figure 22 shows how various processing elements are concatenated using the reconfigurable 
interconnect to implement the AC3 function in stage[i+1].

DETAILED DESCRIPTION OF THE INVENTION 

Figure 1 shows a preferred embodiment of the present invention. The apparatus 10, 
referred to herein as a Reconfigurable Resource Core ("RRC") 10, is preferably embodied as 
an integrated circuit 1, or as a portion of an integrated circuit having other components, such
as memory 15 and/or an embedded RISC core (not shown). The RRC 10 comprises one or a
plurality of processing blocks 2, labelled as 2A through 2Z in figure 1 (individually and 
collectively referred to as processing blocks 2). The processing blocks 2 can communicate 
via reconfigurable interconnect 21. Specific routing selections are determined by the 
reconfigurable interconnect controllers 25. Data transferred between the processing blocks 2 
can be both control and data information. The processing blocks 2 can take on two forms,
namely a shared resource block 20A or a dedicated resource block 20B (individually and
collectively referred to as processing blocks 2). When implemented as an integrated circuit 1,
one or more of the processing blocks 2 can be employed as input interface circuitry 13 and/or
output interface circuitry 14. Data is transferred to the input interface 13 via input 
interconnect 133. Interface control signals 134 are used to control the flow of data. Likewise, 
data is transferred from the output interface 14 using output interconnect 143. Interface 
control signals 144 are used to control the flow of data. A master controller 16 is used to 
configure and reconfigure the processing blocks 2 and reconfigurable interconnect 21. 
Dedicated interconnect 28 provides means for a master controller 16 to communicate and 
transfer both control and data information to the various configuration memories within the 
RRC 10. The master controller 16 can write data to a reconfigurable memory and read from a 
reconfigurable memory. 

The configuration of the plurality of reconfigurable interconnects 21, reconfigurable 
interconnect controllers 25 and processing blocks 2 is performed by a master controller 16. 
However, as explained later, processing block interconnect 21 can be controlled locally by
controller elements 22. The master controller 16 can be a dedicated unit or be implemented
from one or more reconfigurable processing blocks 2, as outlined in figure 1. In addition, the
master controller can be implemented by an external processing unit, such as a 
microprocessor or ASIC. Global memory means 15 can be any semiconductor memory 
means, such as RAM, ROM, SRAM, DRAM, EEPROM or FLASH memory. It can also be a 
combination of these memory technologies. The global memory 15 can be used to store data 
and configuration data for implementing different algorithms. 

As outlined above, the processing block 2 can take two forms. Figure 5 shows a 
logical block diagram of the shared resource processing block 20A. Figure 6 shows a logical 
block diagram of a dedicated resource block 20B. 

The shared resource block 20A comprises one or a plurality of controller elements 22, 
shown as controller elements 22A through 22N (individually and collectively referred to as 
controller elements 22), one or a plurality of shared processing elements 23, shown as 
processing elements 23A through 23M (individually and collectively referred to as 
processing elements 23), dedicated interconnect 29 for implementing direct connections
between a controller element 22 and one or more processing elements 23, a reconfigurable
interconnect 21 which provides means to allow any controller element 22 to communicate
with any processing element 23, and a reconfigurable interconnect controller 25 to configure
the desired local interconnect configuration and to allow communication with other processing
blocks 2. The reconfigurable interconnect also allows communication (data transfer) between
any of the processing elements 23. In a preferred embodiment, the reconfigurable
interconnect 21 can be controlled and configured directly by one or a plurality of controller
elements 22. The operation of the shared resource block 20A will be described in more detail
later. The reconfigurable interconnect 21 also allows the output from one processing element 
23 to be input to any other processing element 23. This allows many processing elements to 
be concatenated in different ways to form different datapath and hence algorithmic functions. 
The reconfiguring of the different processing element 23 concatenation configurations can be 
changed on a cycle-by-cycle basis. 
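
The concatenation of processing elements through the reconfigurable interconnect can be
pictured with a short software sketch. This is an illustration under stated assumptions, not
the circuit itself: the three element functions, their names and the use of a list to stand in
for one interconnect configuration are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical processing elements; the names and functions are assumptions.
PROCESSING_ELEMENTS: Dict[str, Callable[[int], int]] = {
    "shift_left_1": lambda x: x << 1,
    "add_3": lambda x: x + 3,
    "negate": lambda x: -x,
}


def run_datapath(route: List[str], value: int) -> int:
    """Pass a value through processing elements in the order given by route.

    The route list stands in for one interconnect configuration: the output
    of each named element is fed to the input of the next.
    """
    for name in route:
        value = PROCESSING_ELEMENTS[name](value)
    return value


# Two interconnect configurations built from the same elements.
config_a = ["shift_left_1", "add_3"]   # computes (x * 2) + 3
config_b = ["add_3", "shift_left_1"]   # computes (x + 3) * 2

# The configuration can differ from one cycle to the next.
results = [run_datapath(cfg, 5) for cfg in (config_a, config_b)]
```

With an input of 5 the two configurations give 13 and 16 respectively, showing that the same
elements yield different datapath functions depending only on how they are connected.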

The dedicated resource block 20B comprises one or a plurality of dedicated elements 
26, shown as dedicated elements 26A through 26M (individually and collectively referred to 
as dedicated elements 26), a reconfigurable interconnect 21 which provides means to allow 
any dedicated element 26 to communicate with any other dedicated element 26 within the 
same processing block 2, and a reconfigurable interconnect controller 25 to configure the
desired interconnect configuration and to allow communication with other processing blocks 2.
Each dedicated element 26 further comprises a controller element 22, shown as controller 
elements 22A through 22M (individually and collectively referred to as controller elements 
22), a processing element 24, shown as processing elements 24A through 24M (individually 
and collectively referred to as processing elements 24) and dedicated interconnect 29 to 
transfer control and data information between the controller element 22 and the processing 
element 24. Many digital signal-processing algorithms use similar arithmetic functions that 
are repeated many times. For example, algorithms to implement digital filters, Fast Fourier
Transforms (FFTs), convolution, correlation and discrete cosine transforms (DCTs) require a 
Multiply and Accumulate (MAC) operation to be performed many times on data samples. 
Consequently, a rigid processing element implementing a MAC type operation can then be 
used to implement these different digital signal-processing functions. The operation of the 
dedicated resource block 20B will be described in more detail later with reference to figure 
10 as a specific example of a dedicated resource. 
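
The reuse of a single MAC operation across different signal-processing functions can be shown
with a brief sketch. The function names below are assumptions chosen for illustration, and the
code models the arithmetic only, not the rigid processing element itself.

```python
from typing import List, Sequence


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-and-accumulate step: acc + a * b."""
    return acc + a * b


def dot(xs: Sequence[int], ys: Sequence[int]) -> int:
    """Inner product built purely from repeated MAC steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


def fir_filter(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = mac(acc, c, samples[n - k])
        out.append(acc)
    return out


def correlate(xs: Sequence[int], ys: Sequence[int], lag: int) -> int:
    """Cross-correlation at one non-negative lag: sum over n of xs[n] * ys[n + lag]."""
    return dot(xs, ys[lag:])  # zip in dot() stops at the shorter sequence


filtered = fir_filter([1, 2, 3, 4], [1, 1])
```

Both the filter and the correlation reduce to the same `mac` step, which is the property that
allows one hardwired element to serve several algorithms.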

The processing elements 23A through 23N in figure 5 can be rigid circuits, such as
full custom logic and standard cell logic employed in ASICs, to implement one of a plurality
of fixed functions. These functions include arithmetic functions (both fixed point and floating
point), logical functions, logarithm conversion, anti-logarithm conversion, shifters,
comparators, memory, combinatorial logic, finite state machines and polynomial finite state
machines. In addition, the processing elements 23 within a shared processing block 20A can
have different bit widths. They can also implement the same function. For example, due to
the computational requirement of the shared resource a processing block may contain four
processing elements 23 hardwired as 16x16 bit multipliers, two processing elements 23
hardwired as logical elements, a processing element 23 hardwired as a logical element and a
processing element 23 hardwired as a shifter.

Controller elements 22 are implemented using rigid logic or programmable-rigid 
resources, such as a micro-coded controller. A specific example is described later and shown 
as block 501 in figure 10. Controller elements can be implemented in different ways and can 
be used to control the reconfigurable interconnect 21 directly allowing different 
interconnection configurations. In a preferred embodiment the number of controller elements
22 is greater than the number of processing elements 23 for a particular shared processing
block 20A, and the controller elements 22 are clocked at a lower frequency than the
processing elements 23. This arrangement will allow the different processing elements 23 to
be multiplexed or shared by the different controller elements 22 without there being any 
perceived processing delays. The clock frequency of the processing elements 23 should be at 
least n times faster than that applied to the individual controller elements, where n is equal to 
the number of controller elements 22. A controller element 22 can also control the 
configuration of the reconfigurable interconnect 21. 

Figure 7 shows a particular implementation of a processing element 23 used in the 
shared processing block 20A. This particular function is an arithmetic-logic processing 
element 300. The arithmetic logic unit (ALU) 301 has two inputs A and B connected to
de-multiplexers 303 and 304 respectively. Each de-multiplexer 303 and 304 has N-1 source
memories connected to it, where N is the number of controller elements 22 in the same
processing block 20A. Figure 7 shows two distinct groups of source memories, source
memories A 308, labelled 0A through (n-1)A, and source memories B 309, labelled 0B
through (n-1)B. In a preferred embodiment of the invention, source memory A and source
memory B work as a paired group based on a common de-multiplexer select index. However, 
different source A and source B memories can be used as inputs to the ALU 301. Data output 
from the ALU 301 can be transferred to one of a plurality of destination memories 310. 
Status information 302 generated as a result of each ALU 301 operation is output to the 
reconfigurable interconnect 21. This data can then be read by any of the n-1 controller 
elements 22. 

Control of the ALU, source memory selection and destination memory selection is 
performed by signals output from the pipeline register 311. This register 320 is divided into 
several fields as shown in figure 8 with each field controlling a particular portion of the 
processing element 23. As outlined above, several controller elements 22 can share a group 
of common resources 23. To do so the processing elements need to operate at a higher 
frequency than the controller elements. In certain circumstances a controller will be operating 
at a lower frequency. For example, an input interface that receives data serially will convert it 
to a parallel format before processing and transferring the data internally. If the word length
is 16 bits then a controller will wait 16 clock cycles before processing and transferring the
data. Also, interfaces can employ flow control signals and so an interface may have to wait 
an integer number of clock cycles before new data is received. This therefore allows 
resources normally used by a controller to be shared by other controllers. 

To access a processing element 23 each controller element 22 needs to make a request 
to that particular processing element. However, if only one controller element is used then the 
access circuitry is not required. In a preferred embodiment, as shown in figure 7, a register 
307 is provided to store a request from each controller element 22. Each register 307 is 
connected to its corresponding controller element 22 via interconnections 29. Control unit
306 transfers each request word from registers 307 to the FIFO 305 on a round robin basis. If
there is no request data for a particular controller then no data is transferred to the FIFO 305. 
If the FIFO 305 is empty as there are no requests then the associated circuitry, including the 
ALU, is not clocked (effectively turned off) to reduce power consumption. The control unit 
306 transfers request data from the register 307 to the FIFO 305 at a frequency of at least N 
times the clock frequency used by the controller elements 22, where N is the number of
controller elements 22 in a particular processing block 20A. The ALU, source memory read
and destination memory write operations also operate at this higher frequency. If the FIFO 
305 is not empty the control unit 306 reads the next FIFO location and transfers the stored 
request data to the pipeline register 311. 
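
The round robin access scheme described above can be sketched in software as follows. This is
a behavioural illustration only. The class and method names are hypothetical, the request
payload is reduced to a string, and the fast clock is assumed to run at exactly N times the
controller clock so that each request register is polled once per controller clock period.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class SharedElementArbiter:
    """Sketch of request registers 307, control unit 306 and FIFO 305.

    Class and method names are hypothetical, not from the specification.
    """

    def __init__(self, num_controllers: int) -> None:
        # One request register per controller element; None means no request.
        self.request_regs: List[Optional[str]] = [None] * num_controllers
        self.fifo: Deque[Tuple[int, str]] = deque()
        self.next_index = 0  # round robin pointer

    def post_request(self, controller: int, request: str) -> None:
        """A controller element writes its own request register."""
        self.request_regs[controller] = request

    def fast_tick(self) -> Optional[Tuple[int, str]]:
        """One cycle of the fast clock.

        Polls one request register in round robin order, queues any pending
        request, then pops the next queued request for the pipeline register.
        Returns None when the FIFO is empty, the case in which the ALU would
        not be clocked.
        """
        i = self.next_index
        self.next_index = (i + 1) % len(self.request_regs)
        if self.request_regs[i] is not None:
            self.fifo.append((i, self.request_regs[i]))
            self.request_regs[i] = None
        return self.fifo.popleft() if self.fifo else None


# Three controllers; only 0 and 2 have work in this controller clock period.
arbiter = SharedElementArbiter(num_controllers=3)
arbiter.post_request(2, "ADD")
arbiter.post_request(0, "MUL")
served = [arbiter.fast_tick() for _ in range(3)]
```

Over the three fast cycles the sketch serves controller 0, idles for controller 1, and then
serves controller 2, regardless of the order in which the requests were posted.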

Field 324 of the request word is the Controller / Source Select Identifier. This field 
has several uses. It identifies the controller element 22 that made the request so result data 
can be returned to the appropriate source e.g. status information. In the preferred embodiment 
source memory A 308 and source memory B 309 are associated with each controller element 
22. Therefore, field 324 can be used to select the source memory pairs. The function field 
323 is used to select the desired ALU 301 function. Field 322 is the Operation Identifier. This 
effectively acts as a timestamp and can be used by the controller element 22 to synchronize 
the sequence of operations if several have been scheduled. This method of operation allows 
greater throughput and saves the controller element waiting for the return of each result from 
a processing element 23. Field 321 is the Repeat Field. A controller element 22 may wish to 
perform the same operation on a sequence of data. Instead of making several separate 
requests, the controller can make one request, which is then repeated several times. The 
number of repeat operations is determined by the Repeat Field 321 and used by the control 
unit 306 to implement the repeat operations. 
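
The request word fields described above may be illustrated with a short packing and unpacking sketch in Python. The field order (321, 322, 323, 324 from most significant to least significant) and the 4-bit field widths are assumptions made purely for the example; the description does not specify them. The names RequestWord, pack and unpack are likewise hypothetical.

```python
from typing import NamedTuple

# Assumed widths in bits for fields 321, 322, 323 and 324 respectively.
WIDTHS = (4, 4, 4, 4)

class RequestWord(NamedTuple):
    repeat: int        # field 321, Repeat Field
    operation_id: int  # field 322, Operation Identifier
    function: int      # field 323, ALU function select
    select: int        # field 324, Controller / Source Select Identifier

def pack(word):
    """Pack the four fields into one integer, field 321 most significant."""
    value = 0
    for field, bits in zip(word, WIDTHS):
        if not 0 <= field < (1 << bits):
            raise ValueError("field value does not fit in its assumed width")
        value = (value << bits) | field
    return value

def unpack(value):
    """Recover the four fields from a packed request word."""
    fields = []
    for bits in reversed(WIDTHS):
        fields.append(value & ((1 << bits) - 1))
        value >>= bits
    return RequestWord(*reversed(fields))
```

A processing element with a single fixed function could use the same scheme with the function field omitted.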

As outlined above the sharing of the processing elements 23 does not have to be on a
round robin basis. Other methods of sharing the processing elements 23 can be employed.
These are referred to as statistical multiplexing of the shared resources. One method of 
statistically multiplexing the processing elements 23 is to use a weighted allocation, such as 
that described above using the repeat field 321. Another method (not shown) is to employ a 
request / grant scheme where shared resources are provided on a first-come first-served basis. 
An extension to this method is to use a priority based request / grant scheme. The type of 
scheme employed will depend on the system and algorithms being implemented. The amount 
of statistical multiplexing can be determined from simulation of a particular system prior to 
implementing it in a Reconfigurable Resource Core 10. 
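
A priority based request / grant scheme of the kind mentioned above may be sketched as follows. This Python model is an assumption for illustration: a lower priority number is served first, and requests of equal priority are served first-come first-served. The class name PriorityArbiter and its methods are hypothetical.

```python
import heapq
import itertools

class PriorityArbiter:
    """Grants a shared processing element to one requester at a time."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker: order of arrival

    def request(self, controller_id, priority=0):
        """Queue a request; a lower priority value is granted earlier (assumed)."""
        heapq.heappush(self._heap, (priority, next(self._arrival), controller_id))

    def grant(self):
        """Return the next controller to be served, or None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With every request given the same priority the model reduces to the plain first-come first-served scheme.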

Figure 8 shows a particular request word format 320 as applied to the ALU
processing element 23. The processing elements 23 can implement different fixed functions.
Consequently, the request word format 320 for these will be different to that shown in figure
8. For example, a processing element 23 may implement a multiplier. Therefore, there does 
not need to be a Function field 323 as it is implicit what the operation is. 

Though figure 7 illustrates a simple ALU, a processing element 23 can be configured
to implement a set of sub-functions, such as a multiply and accumulate function used for 
implementing digital filters, Fast Fourier Transforms, Inverse Fast Fourier Transforms, 
Discrete Cosine Transforms (DCTs), correlation and convolution functions for example. 

In another preferred embodiment uniform rigid hardware processing elements 23 can
be concatenated to form wider operand word widths. For example, two 8-bit ALUs can be
concatenated to form a 16-bit ALU. The data signals required for the larger configuration,
such as carry-in and carry-out signals, are routed via the reconfigurable interconnect 21.
In addition, dedicated routing can be used and selected using multiplexers (not shown).
The two processing elements 23 are controlled by a single controller element 22.
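
The concatenation of two 8-bit ALUs into a 16-bit ALU may be illustrated for the addition function. In the Python sketch below, the carry-out of the low-order element is passed to the carry-in of the high-order element, standing in for the signal routed via the interconnect. The function names add8 and add16 are hypothetical.

```python
def add8(a, b, carry_in=0):
    """One 8-bit ALU performing an add; returns (8-bit sum, carry-out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8

def add16(a, b):
    """Two 8-bit ALUs concatenated: the low carry-out feeds the high carry-in."""
    low, carry = add8(a & 0xFF, b & 0xFF)
    high, carry_out = add8((a >> 8) & 0xFF, (b >> 8) & 0xFF, carry)
    return (high << 8) | low, carry_out
```

The same carry chaining extends to wider words by concatenating further elements.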

Figure 9 shows another implementation of a processing element 23. In this particular 
example the processing element implements a general-purpose reconfigurable finite state 
machine 400. However, the register portions can be bypassed so it can be used as a
general-purpose combinatorial logic element. Data is input to and output from the processing
element 400 using reconfigurable interconnect 21. As described later, the selection of the input
and output signals can be implemented using pass transistors and/or multiplexers and
de-multiplexers (not shown in figure 9). Reconfigurable Logic Array 401 is an array of
programmable-rigid combinatorial logic gates, such as AND gates, OR gates, NAND gates, NOR
gates, exclusive or gates and inverters, whose function is determined by the Test Condition
Select Vector 410. In yet another embodiment of the invention the reconfigurable logic array
401 can employ multiplexers and/or look-up tables to implement combinatorial logic functions.

Outputs 414 from the Reconfigurable Logic Array 401 are passed to the priority
encoders 402 and 403. The output from priority encoder 402 forms part of the next address
405. It is also used to enable priority encoder 403. This architecture provides an efficient
implementation for multi-level "if-then-else" routines used in C/C++, VHDL and Verilog 
languages. It also simplifies finite state machine synthesis and design compilers.
Though only one priority encoder 402 is shown more can be used for more complex 
combinatorial logic. Vectors 411 and 412 output from priority encoders 402 and 403 
respectively are combined to form the Next Address vector 405. This is used as the address 
input to the next state memory 404. The output 406 from the next state memory 404 is stored 
on the next clock cycle in output register 407. The output register is divided into several 
separate fields. Field 408 represents the current state vector and is input to the 
Reconfigurable Logic Array 401. Field 409 provides output signals that are set depending on 
the current state. 
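
The way a priority encoder maps an "if-then-else" chain onto a next state memory lookup may be illustrated with a simplified software model. The Python sketch below is an assumption for illustration only: it uses a single priority encoder, treats the logic array outputs as a flat list of test lines, and uses the encoder output directly as the next address. The two-state machine and the names IDLE, RUN, TESTS, NEXT_STATE_MEMORY and step are hypothetical.

```python
IDLE, RUN = 0, 1

def priority_encode(lines):
    """Return the index of the first asserted line, or None if none is asserted."""
    for index, asserted in enumerate(lines):
        if asserted:
            return index
    return None

# Each test line is qualified by the current state (fed back from the output
# register) and ordered so that earlier lines win, as in an if / else-if chain.
TESTS = [
    (IDLE, lambda inp: inp["start"]),   # if start
    (IDLE, lambda inp: True),           # else
    (RUN,  lambda inp: inp["stop"]),    # if stop
    (RUN,  lambda inp: True),           # else
]

# Next state memory: one (next state, output signal) entry per test line.
NEXT_STATE_MEMORY = [(RUN, 1), (IDLE, 0), (IDLE, 0), (RUN, 1)]

def step(state, inputs):
    """One clock cycle: evaluate test lines, encode, look up the next state."""
    lines = [state == s and bool(test(inputs)) for s, test in TESTS]
    address = priority_encode(lines)
    return NEXT_STATE_MEMORY[address]
```

Because every state ends with an unconditional "else" line, the encoder always produces a valid address in this model.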

To maximise logic utilization and sharing of resources the general purpose 
reconfigurable finite state machine 400 can be multiplexed in time to implement several 
finite state machines. In this configuration a controller element 22 is used to select and 
schedule the execution of each next state calculation for each finite state machine. The next 
state memory 404 contains the state vectors for each state of the different finite state 
machines. The various state vectors for a particular finite state machine are grouped together 
in memory. An address offset field 415 is provided by the controller element 22 to allow 
addressing of the different finite state machine groups in memory 404. Once calculated, the 
current state vector for each finite state machine is stored in an output register 407, shown as 
407A through 407I in figure 9. Each current state output register 407 has an enable signal
416, shown as 416A through 416I, which is used by the controller element 22 to dynamically
select and load the corresponding output register 407. 
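
Time multiplexing of several finite state machines over one next state memory may be sketched as follows. This Python model is illustrative and makes simplifying assumptions: each machine has no inputs, so the next address is just the address offset plus the current state. The memory contents, the offsets and the name step_machine are hypothetical.

```python
# One shared next state memory holding two grouped machines:
# addresses 0-2 hold a modulo-3 counter, addresses 3-4 hold a toggle.
NEXT_STATE_MEMORY = [1, 2, 0, 1, 0]

# Address offset (cf. field 415) of each machine's group in the memory.
OFFSETS = [0, 3]

def step_machine(machine, state_registers):
    """Advance one machine; only its own output register is enabled and loaded."""
    address = OFFSETS[machine] + state_registers[machine]
    state_registers[machine] = NEXT_STATE_MEMORY[address]
    return state_registers

# The controller element schedules which machine is evaluated in each cycle.
states = [0, 0]
for scheduled in (0, 1, 0, 0):
    step_machine(scheduled, states)
```

Each call leaves the state registers of the machines that were not scheduled unchanged, which corresponds to their enable signals being inactive.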

In another embodiment, the shared processing resource elements 23 can be multiple 
instances of the same function, such as a multiplier. This configuration is useful for parallel 
processing applications where the same operation is applied multiple times. This allows one 
controller element 22 to access and use many processing elements 23 simultaneously. The 
reconfigurable interconnect 21 also allows the output from one processing element 23 to be 
input to any other processing element 23. This allows many processing elements to be 
concatenated in different ways to form different datapath and hence algorithmic functions. 
The concatenation configuration of the processing elements 23 can be
changed on a cycle-by-cycle basis under the control of either a reconfigurable interconnect
controller 25 or a controller element 22.



This arrangement is shown in figures 19 and 22. Figure 17 shows a section of
pseudo code for implementing part of the AC3 exponent decoding function. Figure 18 shows
the data flow graph for implementing this code in stage[i]. Figure 20 shows a section of
pseudo code for implementing a subsequent part of the AC3 exponent decoding function once
the previous function has completed. Figure 21 shows the data flow graph for
implementing this code in stage[i+1]. In stage[i] the reconfigurable interconnect 21 of a
particular processing block 2 is configured so the various processing elements are
concatenated to implement the data flow graph shown in figure 18. This configuration is
shown in figure 19. The configuration can be implemented for many clock cycles using
different input data at each clock cycle. Once stage[i] has completed, the stage[i+1]
configuration can be implemented by reconfiguring the reconfigurable interconnect 21. This
is shown in figure 22. This allows the next set of functions to be implemented on the required 
input data. By concatenating various processing elements many functions can be performed 
in parallel and in one clock cycle. 

A dedicated resource 26 comprises a controller element 22 and a processing element 
24. The processing element 24 can implement one or a plurality of different algorithms or 
functions and can contain more than one rigid processing resource. Figure 10 shows a logical 
block diagram of a particular form of dedicated resource 26 configured as a MAC processor 
500. In this particular configuration the controller element 22 is shown as a specific 
controller element 501 and the processing element 24 as a specific processing element 502. 
The controller element 501 is a programmable-rigid hardwired resource. It is a micro-coded 
controller. Micro-code instructions used to implement and perform functions, sub-functions 
and algorithms are stored in the micro-code memory 520. The address of the next 
microinstruction is generated by the micro-code controller 510. The output from the micro- 
code memory 520 is stored in the pipeline register 530 on the next clock cycle if the enable 
signal 531 is valid. The output of the pipeline register 530 is divided into fields, each of
which is used to control circuitry in both the micro-code controller 510 and the processing 
element 502. 

The micro-code memory 520 can store a sequence of microinstructions to perform 
one task or function or several groups of microinstructions used to implement several tasks or 
sub-functions. The contents of a micro-code memory 520 can be changed dynamically either
in real-time or non real time by a master controller 16. This technique allows dynamic 
sharing of the available resources and gives more efficient logic utilization. Consequently, 
the same controller element 22 can implement and perform many different algorithmic 
functions. Depending on the overall system functionality, the micro-code memory 520
used in each controller element 22 can be dynamically reconfigured at different rates. For
example, controller elements 22 used to implement input and output interfaces only need to 
be configured at system initialisation or system reset. These types of functions don't normally 
change during device operation. Alternatively, the micro-code memory 520 can be loaded 
many times per second with a new sequence of microinstructions so the associated controller 
element 22 can implement many different functions. This method allows the same rigid 
hardware elements to be reconfigured in real time and non real time. Consequently, the same 
reconfigurable integrated circuit can be used in many different applications, such as audio, 
video, data processing and telecommunication protocol processing. It also allows an 
application employing a reconfigurable integrated circuit to implement new standards,
upgrades and new applications, bringing an end to built-in obsolescence. The output
from a pipeline register 530 can be routed to several processing elements 23, 24 having the
same function. This provides a means for implementing a Single Instruction Multiple Data
(SIMD) type architecture. Having different controller elements 22 controlling different
processing elements 23, 24 provides a means for implementing a Multiple Instruction Multiple
Data (MIMD) type architecture.

The next micro-code memory address is selected from one of several sources. The 
selected address is output via the de-multiplexer 511. At reset or initialisation the start 
address register 515 is selected. For sequential microinstructions the source of the next 
address is from the incrementer 514 which increments the current address by 1 each clock 
cycle. The micro-code controller can jump to a non-contiguous address in the micro-code 
memory 520 by selecting the branch address 532 output from the pipeline register 530. The 
decision to perform the branch instruction can be conditional or non-conditional. For 
conditional branches the micro-code controller 510 tests a selected condition using the 
condition test logic 512. The inputs to the condition test logic 512 come from the ALU status
logic 559 in this particular example. For some algorithms the same instruction needs to be 
repeated a number of times. To achieve this a repeat count register 513 is used. This register 
is loaded with a repeat value 533 from the pipeline register 530. To reduce the width of the
pipeline register it is possible to multiplex the repeat field 533 and branch address 532 
outputs. When a microinstruction is being repeated the pipeline register 530 is inhibited from 
being clocked by the enable signal 531. 
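
The next address selection described above may be illustrated with a small software model of the micro-code controller. The Python sketch below is an assumption for illustration: the microinstruction encoding (a "next" selector, a "branch" address and a "repeat" count), the "halt" selector used to end the simulation, and the function name run_sequencer are all hypothetical. The condition argument stands in for the condition test logic.

```python
def run_sequencer(program, start_address, condition, max_cycles=32):
    """Return the sequence of micro-code addresses executed, one per clock cycle."""
    trace = []
    address = start_address                      # start address selected at reset
    repeat_count = program[address].get("repeat", 0)
    for _ in range(max_cycles):
        trace.append(address)
        instruction = program[address]
        if repeat_count > 0:
            repeat_count -= 1                    # pipeline register is held
            continue
        select = instruction["next"]
        if select == "halt":                     # hypothetical, ends the model
            break
        if select == "branch" or (select == "branch_if" and condition()):
            address = instruction["branch"]      # jump to non-contiguous address
        else:
            address += 1                         # incrementer output
        repeat_count = program[address].get("repeat", 0)
    return trace

PROGRAM = [
    {"next": "inc"},
    {"next": "inc", "repeat": 2},                # executed three times in total
    {"next": "branch_if", "branch": 4},
    {"next": "halt"},
    {"next": "halt"},
]
```

Running the example program with a true condition takes the branch to address 4, whereas a false condition falls through to address 3.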

The processing element 502 in figure 10 implements a multiply-accumulate function. 
This can be used for implementing digital filters, Fast Fourier transforms, Inverse Fast 
Fourier Transforms (IFFTs), discrete cosine transforms, periodic and non-periodic waveform 
generation, correlation and convolution functions for example. Apart from the memories used
in the processing element 502 the other circuitry can be hardwired. The multiplier 557 can
perform fixed and/or floating-point calculations. It takes its inputs from a data memory 554
and a coefficient memory 555. The coefficient memory 555 has a dedicated incrementer 556, which
is incremented every clock cycle under the control of the pipeline register 530. The inputs to the data
memory 554 and coefficient memory 555 are via the reconfigurable interconnect 21. Output 
data is also transferred to other processing resources via the reconfigurable interconnect 21. 
The output of the multiplier 557 can be latched using register 558. The output of the register 
558 is input to the ALU 560 together with the output of the register 561. Though only one 
ALU output register 561 is shown, several can be provided and selectively input to the ALU 
560. Selection of the ALU function is determined by the pipeline register field 538. 
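
The multiply-accumulate function may be illustrated by its use in a finite impulse response (FIR) digital filter. The Python sketch below is a behavioural model only, assuming integer data and a coefficient address that increments once per multiply. The function names mac and fir are hypothetical.

```python
def mac(data_memory, coefficient_memory, start=0):
    """Multiply-accumulate over one set of data, one product per clock cycle."""
    accumulator = 0                   # models the ALU output register
    coefficient_address = 0           # models the dedicated incrementer
    for offset in range(len(coefficient_memory)):
        product = data_memory[start + offset] * coefficient_memory[coefficient_address]
        accumulator += product        # ALU adds the latched product
        coefficient_address += 1
    return accumulator

def fir(samples, coefficients):
    """FIR filter y[n] = sum over k of h[k] * x[n-k], built from repeated MACs."""
    taps = len(coefficients)
    padded = [0] * (taps - 1) + list(samples)    # samples before n = 0 are zero
    return [mac(padded[n:n + taps][::-1], coefficients) for n in range(len(samples))]
```

Applying an impulse to the filter returns the coefficients themselves, which is a convenient check of the model.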

The data memory 554 address is generated using dedicated logic. Similar logic can 
also be used to address the coefficient memory 555 and is indicated in figure 10 by signals 
562. As several algorithms may be in use, a register file 550 is provided to hold the start
addresses for each set of data. The register file 550 location address is provided by the 
pipeline register field 534. The data memory address is stored in register 553 and is 
calculated by the address ALU 552. The inputs to the address ALU 552 come from the 
register file 550 and the de-multiplexer 551. Address ALU function and data input selection 
are determined by the pipeline register fields 536 and 535 respectively. 

As the pipeline register 530 controls several circuit blocks many processing actions 
can be performed in parallel and a greater throughput can be achieved (hardware 
acceleration). In another embodiment, the dedicated processing resource elements 502 can be 
multiple instances of the same function, such as a multiplier. This configuration is useful for 
parallel processing applications where the same operation is applied multiple times. This
allows one controller element 501 to access and use many processing elements 502 
simultaneously. 

The processing blocks 2 may be grouped and interconnected in different ways to form 
different device architectures. Both shared resource processing blocks 20A and dedicated 
resource processing blocks 20B may be freely mixed and replicated to form architectures 
consisting of 10s, 100s or 1000s of blocks. Figure 11 shows how both shared resource 
processing blocks 20A and dedicated resource processing blocks 20B may be combined to form
a hierarchical network of processing blocks. These processing blocks 2 communicate via the
reconfigurable interconnect 21. The actual routing of signals between the processing blocks
is controlled by the reconfigurable interconnect controllers 25. In the hierarchical architecture
the outer processing blocks 2 will tend to be the shared resource processing blocks 20A,
used to implement interface functions, for example, whereas the inner processing blocks 2
will tend to be the dedicated resource processing blocks 20B, used to perform processor
intensive calculations.

In another embodiment, the processing blocks 2 may be grouped as four units, for 
example, having local reconfigurable interconnect 21 and a reconfigurable interconnect 
controller 25. This sub-group can then be replicated many times to form a fractal type 
architecture as shown in figure 12. 

The master controller 16 initialises the reconfigurable integrated circuit at start-up or 
reset. It has access to each of the reconfigurable interconnect memories 251 and micro-code 
memories 520 of each of the controller elements 22. Communication between the master
controller and memories is via the reconfigurable interconnect 21. In a preferred 
embodiment, the communication between the master controller and memories 251, 520 is via 
a dedicated system bus 28. Configuration data used to implement different algorithmic 
functions and configure the routing between elements can be stored locally in the global 
memory 15. It can also be stored in external memory (not shown) and transferred to the 
selected internal configuration memories 251, 520 by the master controller 16. Data may be
written to a reconfigurable memory and read from a reconfigurable memory by the master 
controller 16. 

There are different ways to implement the reconfigurable interconnect 21 and route
signals to different processing blocks 2, elements 22/23/24, global memory 15 and the master 
controller 16. Figure 13 outlines one method. The individual signal lines RI0 through RIn-1 of
the reconfigurable interconnect 21 can be connected to the individual signal lines EI0 through
EIn-1 of a controller or processing element 2, 20A, 20B, 22A - 22N, 23A - 23M, 26A - 26M
using pass transistors 270 through 27(n-1). Each pass transistor's gate is connected to a bit
of the register 252 in a reconfigurable interconnect controller 25. Each input signal and output
signal from either a controller element 22 or processing element 23 can be connected to one
or more of the reconfigurable signals RI0 through RIn-1 (not shown). In addition, each input
signal or output signal from either a controller element 22 or processing element 23 can be
hardwired to the reconfigurable signals RI0 through RIn-1 (not shown) to reduce circuitry. In
another embodiment the gates of a group of pass transistors can be controlled by a single bit
from the register 252. Different routing configurations can be selected and are stored in
the connection memory 251. By addressing different memory locations in the connection
memory 251 and loading the output register 252 with different routing configuration data,
the signal routing can be changed in real time (for example, on a cycle-by-cycle basis).
The updating and accessing of the connection memory 251 is performed by either the master
controller 16 via the dedicated interconnect 28 or can be performed locally by a controller
element 22 via reconfigurable interconnect means 21 or dedicated connection means 253 and
253a.
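
The selection of a routing configuration from the connection memory may be modelled in software. In the Python sketch below, each bit of a configuration word stands for the gate of one pass transistor: a set bit connects interconnect line RI[i] to element line EI[i], and a clear bit leaves the element line undriven, represented here by None. The function name route and the two example configuration words are assumptions for illustration.

```python
def route(interconnect_lines, config_word):
    """Return the element lines seen through the pass transistors.

    Bit i of config_word models the gate of pass transistor i (assumed mapping).
    """
    return [
        value if (config_word >> i) & 1 else None
        for i, value in enumerate(interconnect_lines)
    ]

# Connection memory holding two routing configurations (hypothetical contents).
connection_memory = [0b0101, 0b1010]

ri = [1, 0, 1, 1]                                 # values on RI0 through RI3
cycle_0 = route(ri, connection_memory[0])         # lines 0 and 2 connected
cycle_1 = route(ri, connection_memory[1])         # lines 1 and 3 connected
```

Addressing a different memory location on each clock cycle changes the routing on a cycle-by-cycle basis in this model.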

As outlined above, in a preferred embodiment (not shown) the gates of the pass
transistors, which control the reconfigurable interconnect 21, can be controlled locally
from the output of a controller element 22. In essence the reconfigurable interconnect
controller 25 is integrated into a controller element 22.

Pass transistors can also be used to isolate signals to a particular group of processing 
blocks or elements. Figure 14 shows such a scheme. Individual signal lines RI0 through
RIn-1 of the reconfigurable interconnect 21 have a pass transistor 280 through 28(n-1) in series
with each signal line respectively. The gates of the pass transistors 280 through 28(n-1) are
connected to individual bits of the register 252a of a reconfigurable interconnect controller
25a. Different routing configurations can be selected and are stored in the connection
memory 251a. By addressing the connection memory 251a, signal routing can be changed in real
time. The updating of the connection memory 251a is performed by either the master
controller 16 via the dedicated interconnect 28 or can be performed locally by a controller
element 22 via reconfigurable interconnect means 21 or dedicated connection means 253 and
253a.

In addition to employing pass transistors for routing of signals, a processing block 2, 
controller element 22, processing element 23,24 and reconfigurable interconnect controller 
25 can contain de-multiplexer elements which are used to select one signal from a group of 
input signals. Likewise, signals may be output to the reconfigurable interconnection 21 and 
dedicated interconnect 28,29 using multiplexers. These routing methods are illustrated in 
figure 5 for both a controller element 22 and a processing element 23. Specific examples are 
shown for controller element 22B and processing element 23B. A group of input
reconfigurable interconnect signals 21A are connected to de-multiplexer 220. Any of the
input signals 21A can be routed to input signal 222 by applying the appropriate select code
to the de-multiplexer select lines 224. Control of the de-multiplexer select lines 224 comes
from either a controller element's pipeline register 530 or a reconfigurable interconnect
controller's output register 252. A controller element output signal 223 can be multiplexed
onto one of a group of output reconfigurable signals 21B using a multiplexer 221. Control of
the multiplexer select lines 225 comes from either a controller element's pipeline register
530 or a reconfigurable interconnect controller's output register 252.

For a shared resource element, such as 23B, a group of input reconfigurable 
interconnect signals 21C are connected to de-multiplexer 230. Any of the input signals 21C
can be routed to input signal 232 by applying the appropriate select code to the
de-multiplexer select lines 234. Control of the de-multiplexer select lines 234 comes from a
reconfigurable interconnect controller's output register 252. A processing element output
signal 233 can be multiplexed onto one of a group of output reconfigurable signals 21D using
a multiplexer 231. Control of the multiplexer select lines 235 comes from either a controller
element's pipeline register 530 or a reconfigurable interconnect controller's output register
252.

Routing of input and output to and from a dedicated processing element 26 is
illustrated in figure 6 with reference to dedicated processing element 26B. A group of input
reconfigurable interconnect signals 21E are connected to de-multiplexer 260. Any of the
input signals 21E can be routed to input signal 262 by applying the appropriate select code
to the de-multiplexer select lines 264. Control of the de-multiplexer select lines 264 comes
from either a controller element's pipeline register 530 or a reconfigurable interconnect
controller's output register 252. A dedicated element output signal 263 can be multiplexed
onto one of a group of output reconfigurable signals 21F using a multiplexer 261. Control of
the multiplexer select lines 265, 267, 268 comes from either a controller element's pipeline
register 530 or a reconfigurable interconnect controller's output register 252.

In another preferred embodiment, programmable-rigid hardwired resources are 
employed. One type of programmable-rigid hardwired resource is a reconfigurable multi-tap
finite state machine 600. 

Figure 15 shows how four smaller multi-tap finite state machines 601 through 604
can be connected to form a larger multi-tap finite state machine 600. The output of the
previous smaller multi-tap finite state machine is connected to the input of the next
smaller multi-tap finite state machine. For example, output signal 606 of smaller multi-tap
finite state machine 601 is connected to the input of smaller multi-tap finite state machine
602. De-multiplexer 605 is used to select from the outputs 606 through 609 of the smaller
multi-tap finite state machines 601 through 604 respectively. The selected serial data
appears on output 611.

Figure 16 shows the logic used to implement two stages of a smaller multi-tap finite
state machine. Each bit stage includes a 1-bit register 710, 720 to store each input bit when
clocked with clock signal 701. These registers can also be reset using the reset signal 702 or
preloaded using signal 703. De-multiplexer 712 is used to select either the tap line input 606,
607, 608 or 609 or the output from the next stage's exclusive or gate 721. Output selection is
determined by the signal line 714 and can be driven from a controller element 22. The output 
of the de-multiplexer 712 is input to the exclusive or gate 713 of this stage of the smaller 
multi-tap finite state machine. The other input to the exclusive or gate 713 is the data input 
610 as this is the first stage of the smaller multi-tap finite state machine. For each subsequent 
stage of the smaller multi-tap finite state machine the input to the exclusive or gate is the
output from the previous stage. The output of the exclusive or gate of a particular stage is 
input into a de-multiplexer. For stage 0 the output from exclusive or gate 713 is input to de- 
multiplexer 711. The other input to the de-multiplexer is the output from the previous stage 
or the data in input 610 if it is the first stage. Output selection is determined by the signal line 
715 and can be driven from a controller element 22. By controlling the outputs of the two
de-multiplexers used in each stage, these reconfigurable resources can be used to implement a
wide range of serial dividers, serial multipliers, Linear Feedback Shift Registers (LFSRs), 
Cyclic Redundancy Checkers (CRCs) and cyclic coders. This is particularly useful when 
implementing different interfaces and protocols. 
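
The use of a chain of 1-bit register stages with selectable exclusive or feedback to form a Cyclic Redundancy Checker may be illustrated in software. The Python sketch below is a behavioural model and not the circuit of figure 16: each stage either shifts the previous stage's bit or shifts it exclusive or-ed with the feedback bit, depending on a per-stage tap setting. The generator polynomial x^8 + x^2 + x + 1 and the names serial_crc, bits_of, to_int and TAPS are assumptions chosen for the example.

```python
def serial_crc(bits, taps):
    """Shift the message bits through the stages and return the final register.

    taps[i] = 1 enables the feedback exclusive or at stage i (assumed model).
    register[0] is the stage nearest the data input.
    """
    register = [0] * len(taps)
    for bit in bits:
        feedback = register[-1] ^ bit             # last stage output XOR data in
        shifted = []
        for stage, tap in enumerate(taps):
            previous = register[stage - 1] if stage > 0 else 0
            shifted.append(previous ^ feedback if tap else previous)
        register = shifted
    return register

def bits_of(data):
    """Serialise bytes most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def to_int(register):
    """Read the register as an integer, stage 0 being the least significant bit."""
    return sum(bit << i for i, bit in enumerate(register))

# Taps at stages 0, 1 and 2 select the polynomial x^8 + x^2 + x + 1.
TAPS = [1, 1, 1, 0, 0, 0, 0, 0]
```

Changing only the tap settings selects a different polynomial, so the same stages can serve different interfaces and protocols. A message followed by its own checksum leaves the register at zero, which is the usual receive-side check.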

In yet another preferred embodiment one or a plurality of controller elements 22 and 
processing elements 23,24 can be configured to implement test circuitry to check the 
operation of the various controller elements 22, processing elements 23, 24 and 
reconfigurable interconnection controllers 25. If any of these circuit elements are found to
be operating incorrectly, the fault conditions can be reported to a master controller 16 so
that the faulty elements are not included in the implementation of live operational circuits.

Although the invention has been described herein with reference to particular 
preferred embodiments, it is to be understood that these embodiments are illustrative of the 
aspects of the invention. As such, a person skilled in the art may make numerous 
modifications to the illustrative embodiments described herein. Such modifications and other 
arrangements which may be devised to implement the invention should not be deemed as 
departing from the spirit and scope of the invention as described and claimed herein. 